L1 & L2 Regularization Effects on Model Weights
The following histograms display the weights of the 3 layers in the CNN. We can see the effect of L1 and L2 regularization by observing how the weights change over time.
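To make the setup concrete, here is a minimal sketch of how per-layer weight histograms like these can be computed. The layer names and shapes are hypothetical stand-ins for the report's CNN, and random values stand in for trained weights:

```python
import numpy as np

# Hypothetical weight arrays standing in for the CNN's 3 layers
# (random values used in place of actual trained weights).
rng = np.random.default_rng(0)
layers = {
    "conv1": rng.normal(0.0, 0.6, size=(32, 1, 3, 3)),
    "conv2": rng.normal(0.0, 0.5, size=(64, 32, 3, 3)),
    "fc":    rng.normal(0.0, 0.4, size=(10, 576)),
}

# Flatten each layer's weights and bin them, as the report's histograms do.
for name, w in layers.items():
    counts, edges = np.histogram(w.ravel(), bins=64)
    print(name, float(w.min()), float(w.max()), counts.sum())
```

Tracking these histograms over training steps is what makes the regularization effects below visible.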
Created on January 18 | Last edited on March 19
Baseline
As expected, the baseline model with no regularization contains a wide range of weight values, roughly [-2.5, 1].
Run: Baseline
L1 Regularization
We can immediately see how L1 regularization eliminates features in the model. The histograms show a much narrower density distribution around 0 than the baseline. The feature selection is especially visible in how narrow the peak of the weight distribution becomes. The range of the values remains roughly [-2, 1].
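The sparsity described above can be illustrated with L1's proximal (soft-thresholding) update, which is what drives small weights to exactly zero. This is a minimal numpy sketch with a random vector standing in for one layer's weights; the threshold `lam` is an assumed value:

```python
import numpy as np

# Hypothetical weight vector standing in for one CNN layer's weights.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.5, size=1000)

# One soft-thresholding step, the update L1 regularization induces:
# weights with |w| below lam are set exactly to zero.
lam = 0.1
w_l1 = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

print(np.mean(w == 0.0))     # baseline: essentially no exact zeros
print(np.mean(w_l1 == 0.0))  # after L1 step: a noticeable fraction is zero
```

Repeated over many training steps, this is why the L1 histograms develop a tall, narrow spike at 0.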
Run: L1
L2 Regularization
The heavy weight shrinkage of L2 regularization is clearly visible when comparing these histograms with the ones above. The range of the weights shrinks from roughly [-2, 1] down to [-1.5, 0.6].
Run: L2